Objectives:

Data:

<b>Loan_modelling.csv</b> - contains customer information of AllLife Bank

Import Libraries

Read the dataset

View the dataset

Understand the shape

Check the duplications
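
A minimal sketch of a duplicate check, assuming the data sits in a pandas DataFrame. The tiny frame below is a made-up stand-in for the contents of Loan_modelling.csv, and the values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in rows; the notebook would load Loan_modelling.csv instead.
df = pd.DataFrame(
    {
        "Age": [25, 45, 25, 39],
        "Income": [49, 34, 49, 100],
        "Personal_Loan": [0, 0, 0, 1],
    }
)

# duplicated() flags every row that repeats an earlier row exactly.
n_duplicates = int(df.duplicated().sum())
print(f"Duplicate rows: {n_duplicates}")

# Keep only the first occurrence of each row.
df_unique = df.drop_duplicates().reset_index(drop=True)
print(f"Rows after dropping duplicates: {len(df_unique)}")
```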

Check the datatypes

Fixing data types

Alter the data to better fit the descriptions

Check for missing values

Dataset Summary

Zip Code Mapping

EDA

Univariate Analysis

Observations on Age

Observations on Experience

Observations on Income

Observations on Family

Observations on CCAvg

Observations on Education

Observations on Mortgage

Observations on Personal_Loan

Observations on Securities_Account

Observations on CD_Account

Observations on Online

Observations on CreditCard

Bivariate Analysis

Personal_Loan vs Age, Experience, Income, CCAvg and Mortgage

Summary of EDA

Data Description

Observations from EDA

Data Pre-processing

Outlier & Missing Value treatment

Data Engineering

Creating dummy variables
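
One common way to create dummy variables is `pandas.get_dummies`. The sketch below assumes Education is the categorical column being encoded and uses made-up rows; which columns the notebook actually encodes is not stated here.

```python
import pandas as pd

# Hypothetical rows; Education is assumed to be a categorical code (1, 2, 3).
df = pd.DataFrame({"Income": [49, 34, 100, 180], "Education": [1, 2, 3, 1]})

# drop_first=True removes the first level (Education_1) to avoid a redundant
# column that is fully determined by the others.
encoded = pd.get_dummies(df, columns=["Education"], drop_first=True)
print(encoded.columns.tolist())
```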

Splitting the Data

A similar distribution of the independent variables in the train and test sets indicates a good split. The row counts in X confirm a 70/30 split.
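
A 70/30 split like the one described above can be sketched with scikit-learn's `train_test_split`. The synthetic data, the `random_state` value, and the use of `stratify=y` are assumptions for illustration, not necessarily what the notebook does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 rows, 3 features, roughly 10% positive class.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = (rng.random(1000) < 0.1).astype(int)

# test_size=0.30 gives the 70/30 row split; stratify=y (an assumption here)
# keeps the class ratio nearly identical in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y
)

print("Train rows:", X_train.shape[0], "Test rows:", X_test.shape[0])
print("Positive rate (train):", round(y_train.mean(), 3))
print("Positive rate (test):", round(y_test.mean(), 3))
```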

Building Logistic Regression Model

Logistic Regression

Finding the coefficients

Odds from coefficients
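
In logistic regression, a coefficient is a change in log-odds, so exponentiating it gives an odds ratio. The sketch below uses made-up coefficient values (not results from this model) to show the conversion and the implied percentage change in odds.

```python
import math

# Hypothetical coefficients for illustration only; not fitted values.
coefficients = {"Income": 0.05, "CD_Account": 3.2, "Online": -0.6}

odds = {}
pct_change = {}
for name, coef in coefficients.items():
    # exp(coef) is the multiplicative change in odds per one-unit increase.
    odds[name] = math.exp(coef)
    # (odds ratio - 1) * 100 expresses the same change as a percentage.
    pct_change[name] = (odds[name] - 1) * 100
    print(f"{name}: odds ratio = {odds[name]:.3f}, change in odds = {pct_change[name]:.1f}%")
```

A positive coefficient yields an odds ratio above 1 (higher odds of the positive class), and a negative coefficient yields one below 1.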

Coefficient Interpretation

Checking model performance on training set

Model Performance improvement

Optimal threshold using AUC-ROC curve
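
One common way to pick a threshold from the ROC curve is to maximise the gap between the true-positive and false-positive rates (Youden's J). The sketch below assumes that criterion and uses synthetic data; the notebook's own model and data may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# Synthetic, imbalanced stand-in data (about 10% positives).
X, y = make_classification(
    n_samples=1000, n_features=5, weights=[0.9, 0.1], random_state=1
)
model = LogisticRegression(max_iter=1000).fit(X, y)
probs = model.predict_proba(X)[:, 1]

# roc_curve returns one (fpr, tpr) pair per candidate threshold.
fpr, tpr, thresholds = roc_curve(y, probs)

# Pick the threshold where tpr - fpr is largest.
best = int(np.argmax(tpr - fpr))
optimal_threshold = float(thresholds[best])
print(f"Threshold maximising TPR - FPR: {optimal_threshold:.3f}")
```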

Checking new performance on training set

Checking Precision-Recall curve for better threshold
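
A precision-recall curve can suggest a threshold where the two metrics balance. The sketch below assumes the crossing point of precision and recall is the criterion of interest and uses synthetic data rather than the bank dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

# Synthetic, imbalanced stand-in data (about 10% positives).
X, y = make_classification(
    n_samples=1000, n_features=5, weights=[0.9, 0.1], random_state=1
)
model = LogisticRegression(max_iter=1000).fit(X, y)
probs = model.predict_proba(X)[:, 1]

precision, recall, thresholds = precision_recall_curve(y, probs)

# precision and recall carry one extra trailing element with no threshold,
# so drop it before comparing against thresholds.
gap = np.abs(precision[:-1] - recall[:-1])
balanced_threshold = float(thresholds[int(np.argmin(gap))])
print(f"Threshold where precision and recall are closest: {balanced_threshold:.3f}")
```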

Model Performance Summary

Test set Performance

Default Threshold

ROC-AUC

Model with threshold of 0.16

Model with threshold of 0.28

Build Decision Tree Model

Checking model performance on training set

Checking model performance on testing set

Decision Tree Visualization

Decision tree importances

GridSearch tuning our tree model
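
A minimal sketch of tuning a decision tree with `GridSearchCV`. The parameter grid, the recall scoring, and the synthetic data are assumptions chosen for illustration; the notebook's actual grid is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (about 10% positives).
X, y = make_classification(
    n_samples=600, n_features=5, weights=[0.9, 0.1], random_state=1
)

# Hypothetical grid; small on purpose so the search stays fast.
param_grid = {
    "max_depth": [2, 4, 6],
    "min_samples_leaf": [1, 5, 10],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid,
    scoring="recall",  # assumed metric: favour catching positive cases
    cv=5,
)
search.fit(X, y)

best_params = search.best_params_
best_tree = search.best_estimator_
print("Best parameters:", best_params)
print("Best cross-validated recall:", round(search.best_score_, 3))
```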

Checking performance on training set

Checking performance on testing set

Cost Complexity Pruning

Recall vs Alpha for training and testing set
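
Cost complexity pruning can be sketched by fitting one tree per candidate alpha and recording recall on both sets. The data below is synthetic, and clipping the alphas at zero is a defensive assumption against tiny negative values from floating-point error.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (about 10% positives).
X, y = make_classification(
    n_samples=600, n_features=5, weights=[0.9, 0.1], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y
)

# The pruning path lists the effective alphas at which subtrees get pruned.
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(
    X_train, y_train
)
ccp_alphas = np.clip(path.ccp_alphas, 0, None)

results = []  # (alpha, train recall, test recall)
for alpha in ccp_alphas:
    tree = DecisionTreeClassifier(random_state=1, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    results.append(
        (
            float(alpha),
            recall_score(y_train, tree.predict(X_train)),
            recall_score(y_test, tree.predict(X_test)),
        )
    )

# Choose the alpha with the highest test recall.
best_alpha, best_train_recall, best_test_recall = max(results, key=lambda r: r[2])
print(f"Best alpha by test recall: {best_alpha:.5f} (recall {best_test_recall:.3f})")
```

Plotting the second and third entries of `results` against alpha gives the recall vs alpha curves for the training and testing sets.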

Comparing all the decision tree models

Insights and Recommendations